
Approximate Multiplier


Approximate Multiplier Induced Error Propagation in Deep Neural Networks

Alahakoon, A. M. H. H., Saadat, Hassaan, Jayasinghe, Darshana, Parameswaran, Sri

arXiv.org Artificial Intelligence

Deep Neural Networks (DNNs) rely heavily on dense arithmetic operations, motivating the use of Approximate Multipliers (AxMs) to reduce energy consumption in hardware accelerators. However, a rigorous mathematical characterization of how AxM error distributions influence DNN accuracy remains underdeveloped. This work presents an analytical framework that connects the statistical error moments of an AxM to the induced distortion in General Matrix Multiplication (GEMM). Using the Frobenius norm of the resulting error matrix, we derive a closed-form expression for practical DNN dimensions that demonstrates the distortion is predominantly governed by the multiplier mean error (bias). To evaluate this model in realistic settings, we incorporate controlled error injection into GEMM and convolution layers and examine its effect on ImageNet-scale networks. The predicted distortion correlates strongly with the observed accuracy degradation, and an error-configurable AxM case study implemented on an FPGA further confirms the analytical trends. By providing a lightweight alternative to behavioral or hardware-level simulations, this framework enables rapid estimation of AxM impact on DNN inference quality.
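The bias-dominated distortion described in this abstract can be illustrated numerically. The sketch below is a minimal NumPy illustration, not the paper's exact derivation: it assumes a hypothetical AxM whose per-multiplication error is additive Gaussian with mean mu (the bias) and standard deviation sigma, injects that error into a GEMM, and compares the Frobenius norm of the error matrix for a biased versus an unbiased multiplier.

```python
import numpy as np

rng = np.random.default_rng(0)

def approx_gemm(A, B, mu, sigma):
    """GEMM where each scalar multiplication carries an additive
    error drawn from N(mu, sigma) -- a hypothetical AxM error model."""
    # Exact per-element products for an (m, k) x (k, n) multiply.
    P = A[:, :, None] * B[None, :, :]          # shape (m, k, n)
    E = rng.normal(mu, sigma, size=P.shape)    # per-multiplication error
    return (P + E).sum(axis=1)                 # accumulate along k

m, k, n = 32, 64, 32
A = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))
exact = A @ B

# Frobenius norm of the induced error matrix (NumPy's default matrix norm)
# for a biased vs. an unbiased AxM with the same error spread.
biased   = np.linalg.norm(approx_gemm(A, B, mu=0.05, sigma=0.05) - exact)
unbiased = np.linalg.norm(approx_gemm(A, B, mu=0.0,  sigma=0.05) - exact)
print(biased, unbiased)
```

With nonzero bias the k accumulated errors add coherently (roughly k * mu per output element), while zero-mean errors only grow like sqrt(k), so the biased distortion dominates, consistent with the trend the abstract reports.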


Gradient Estimation Methods of Approximate Multipliers for High-Accuracy Retraining of Deep Learning Models

Meng, Chang, Burleson, Wayne, De Micheli, Giovanni

arXiv.org Artificial Intelligence

Approximate multipliers (AppMults) are widely used in deep learning accelerators to reduce their area, delay, and power consumption. However, AppMults introduce arithmetic errors into deep learning models, necessitating a retraining process to recover accuracy. A key step in retraining is computing the gradient of the AppMult, i.e., the partial derivative of the approximate product with respect to each input operand. Existing approaches typically estimate this gradient using that of the accurate multiplier (AccMult), which can lead to suboptimal retraining results. To address this, we propose two methods to obtain more precise gradients of AppMults. The first, called LUT-2D, characterizes the AppMult gradient with 2-dimensional lookup tables (LUTs), providing fine-grained estimation and achieving the highest retraining accuracy. The second, called LUT-1D, is a compact and more efficient variant that stores gradient values in 1-dimensional LUTs, achieving comparable retraining accuracy with shorter runtime. Experimental results show that on CIFAR-10 with convolutional neural networks, our LUT-2D and LUT-1D methods improve retraining accuracy by 3.83% and 3.72% on average, respectively. On ImageNet with vision transformer models, our LUT-1D method improves retraining accuracy by 23.69% on average, compared to a state-of-the-art retraining framework. Modern artificial intelligence (AI) technologies excel in a wide range of areas such as natural language processing and computer vision. However, this rapid growth raises serious concerns about power consumption [1]. To achieve energy-efficient deep learning accelerators, researchers have adopted an emerging design paradigm called approximate computing, which reduces power consumption at the cost of errors [2], [3]. Approximate computing is particularly suitable for deep learning accelerators, since they are inherently resilient to errors and noise.
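The LUT-based gradient idea can be sketched on a toy example. The code below is a hedged illustration, not the paper's implementation: `axmul` is a stand-in approximate multiplier (it truncates the two least-significant bits of the exact product), and the 1-D table stores, for each value of operand b, the average finite-difference gradient with respect to a, in the spirit of LUT-1D.

```python
import numpy as np

def axmul(a, b):
    """Toy approximate multiplier: drop the two least-significant
    bits of the exact product (a stand-in for a real AppMult)."""
    return (a * b) & ~0b11

# Build a 1-D LUT of average gradients d(axmul)/da, indexed by the
# other operand b: average the finite difference over all 8-bit a.
VALS = np.arange(256)
lut = np.empty(256)
for b in VALS:
    diffs = axmul(VALS[1:], b) - axmul(VALS[:-1], b)
    lut[b] = diffs.mean()

# A retraining backward pass would read lut[b] instead of using the
# accurate-multiplier gradient, which for a * b is simply b.
print(lut[37], 37.0)
```

For this truncation-style multiplier the LUT gradient stays close to the accurate gradient b, but for harder approximations the gap widens, which is exactly when a measured gradient table pays off over the AccMult estimate.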


MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators

Leon, Vasileios, Makris, Georgios, Xydis, Sotirios, Pekmestzi, Kiamal, Soudris, Dimitrios

arXiv.org Artificial Intelligence

Nowadays, the rapid growth of Deep Neural Network (DNN) architectures has established them as the de facto approach for providing advanced Machine Learning tasks with excellent accuracy. Targeting low-power DNN computing, this paper examines the interplay of fine-grained error resilience of DNN workloads in collaboration with hardware approximation techniques, to achieve higher levels of energy efficiency. Utilizing the state-of-the-art ROUP approximate multipliers, we systematically explore their fine-grained distribution across the network according to our layer-, filter-, and kernel-level approaches, and examine their impact on accuracy and energy. We use the ResNet-8 model on the CIFAR-10 dataset to evaluate our approximations. The proposed solution delivers up to 54% energy gains in exchange for up to 4% accuracy loss, compared to the baseline quantized model, while it provides 2x energy gains with better accuracy versus the state-of-the-art DNN approximations.
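The layer-level granularity explored here can be mimicked with a small experiment. The sketch below is a hypothetical NumPy stand-in, not the ROUP multipliers or ResNet-8: each "layer" is a matrix product with optional multiplicative error on every multiplication, and we compare the output deviation when only the first layer is approximated versus all layers.

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(x, W, approx=False, mu=0.02):
    """Stand-in for a DNN layer: matrix product plus ReLU, with an
    optional AxM-style relative error on each multiplication
    (hypothetical error model, mean mu)."""
    P = x[:, :, None] * W[None, :, :]
    if approx:
        P = P * (1.0 + rng.normal(mu, 0.01, size=P.shape))
    return np.maximum(P.sum(axis=1), 0.0)

x  = rng.standard_normal((8, 16))
W1 = rng.standard_normal((16, 16))
W2 = rng.standard_normal((16, 10))

exact = layer(layer(x, W1), W2)

# Layer-level assignment: approximate only the first layer vs. both.
one_layer  = layer(layer(x, W1, approx=True), W2)
all_layers = layer(layer(x, W1, approx=True), W2, approx=True)

dev1 = np.abs(one_layer - exact).mean()
dev2 = np.abs(all_layers - exact).mean()
print(dev1, dev2)
```

Sweeping which layers (or, finer, which filters or kernels) receive the approximate multiplier and measuring the resulting deviation is the basic loop behind the layer-, filter-, and kernel-level exploration the abstract describes.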


Carbon-Efficient 3D DNN Acceleration: Optimizing Performance and Sustainability

Panteleaki, Aikaterini Maria, Balaskas, Konstantinos, Zervakis, Georgios, Amrouch, Hussam, Anagnostopoulos, Iraklis

arXiv.org Artificial Intelligence

As Deep Neural Networks (DNNs) continue to drive advancements in artificial intelligence, the design of hardware accelerators faces growing concerns over embodied carbon footprint due to complex fabrication processes. In this work, we propose a carbon-efficient design methodology for 3D DNN accelerators, leveraging approximate computing and genetic algorithm-based design space exploration to optimize Carbon Delay Product (CDP). By integrating area-efficient approximate multipliers into Multiply-Accumulate (MAC) units, our approach effectively reduces silicon area and fabrication overhead while maintaining high computational accuracy. Experimental evaluations across three technology nodes (45nm, 14nm, and 7nm) show that our method reduces embodied carbon by up to 30% with negligible accuracy drop. The rapid growth of Artificial Intelligence (AI) has resulted in the wide adoption of Deep Neural Networks (DNNs) as a fundamental component of modern computing systems. To efficiently support the computational demands of DNNs, specialized hardware accelerators have been developed, offering significant improvements in throughput and energy efficiency. These accelerators have enabled AI deployment across a wide range of environments, from large-scale data centers to resource-constrained edge devices.


Explainable AI-Guided Efficient Approximate DNN Generation for Multi-Pod Systolic Arrays

Siddique, Ayesha, Khalil, Khurram, Hoque, Khaza Anuarul

arXiv.org Artificial Intelligence

Approximate deep neural networks (AxDNNs) are promising for enhancing energy efficiency in real-world devices. One of the key contributors behind this enhanced energy efficiency in AxDNNs is the use of approximate multipliers. Unfortunately, the simulation of approximate multipliers does not usually scale well on CPUs and GPUs. As a consequence, this slows down the overall simulation of AxDNNs aimed at identifying the appropriate approximate multipliers to achieve high energy efficiency with a minimum accuracy loss. To address this problem, we present a novel XAI-Gen methodology, which leverages the analytical model of the emerging hardware accelerator (e.g., Google TPU v4) and explainable artificial intelligence (XAI) to precisely identify the non-critical layers for approximation and quickly discover the appropriate approximate multipliers for AxDNN layers. Our results show that XAI-Gen achieves up to 7x lower energy consumption with only 1-2% accuracy loss. We also showcase the effectiveness of the XAI-Gen approach through a neural architecture search (XAI-NAS) case study. Interestingly, XAI-NAS achieves 40% higher energy efficiency with up to 5x less execution time when compared to the state-of-the-art NAS methods for generating AxDNNs.


TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator

Vungarala, Deepak, Elbtity, Mohammed E., Syed, Sumiya, Alam, Sakila, Pandit, Kartik, Ghosh, Arnob, Zand, Ramtin, Angizi, Shaahin

arXiv.org Artificial Intelligence

The increasing complexity and scale of Deep Neural Networks (DNNs) necessitate specialized tensor accelerators, such as Tensor Processing Units (TPUs), to meet various computational and energy efficiency requirements. Nevertheless, designing an optimal TPU remains challenging due to the high domain expertise level, considerable manual design time, and lack of high-quality, domain-specific datasets. This paper introduces TPU-Gen, the first Large Language Model (LLM) based framework designed to automate the exact and approximate TPU generation process, focusing on systolic array architectures. TPU-Gen is supported with a meticulously curated, comprehensive, and open-source dataset that covers a wide range of spatial array designs and approximate multiply-and-accumulate units, enabling design reuse, adaptation, and customization for different DNN workloads. The proposed framework leverages Retrieval-Augmented Generation (RAG) as an effective solution for a data-scarce hardware domain when building LLMs, addressing the critical issue of hallucinations. TPU-Gen transforms high-level architectural specifications into optimized low-level implementations through an effective hardware generation pipeline. Our extensive experimental evaluations demonstrate superior performance, power, and area efficiency, with an average reduction in area and power of 92% and 96%, respectively, from the manually optimized reference values. These results set new standards for driving advancements in next-generation design automation tools powered by LLMs.


Leveraging Highly Approximated Multipliers in DNN Inference

Zervakis, Georgios, Frustaci, Fabio, Spantidi, Ourania, Anagnostopoulos, Iraklis, Amrouch, Hussam, Henkel, Jörg

arXiv.org Artificial Intelligence

In this work, we present a control variate approximation technique that enables the exploitation of highly approximate multipliers in Deep Neural Network (DNN) accelerators. Our approach does not require retraining and significantly decreases the induced error due to approximate multiplications, improving the overall inference accuracy. As a result, our approach enables satisfying tight accuracy loss constraints while boosting the power savings. Our experimental evaluation, across six different DNNs and several approximate multipliers, demonstrates the versatility of our approach and shows that compared to the accurate design, our control variate approximation achieves the same performance, 45% power reduction, and less than 1% average accuracy loss. Compared to the corresponding approximate designs without using our technique, our approach improves the accuracy by 1.9x on average.
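The general control-variate idea behind this abstract, correcting an approximate result by subtracting a term whose expectation is known, can be sketched as follows. This is a stand-in illustration under an assumed error model (additive error with known mean MU per multiplication), not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(2)

MU = 0.04   # known mean error of the hypothetical AxM (per multiplication)

def ax_dot(a, b):
    """Dot product on a hypothetical AxM whose per-multiplication
    error is additive with mean MU."""
    e = rng.normal(MU, 0.02, size=a.shape)
    return np.sum(a * b + e)

a = rng.standard_normal(256)
b = rng.standard_normal(256)

exact = np.dot(a, b)
approx = ax_dot(a, b)

# Control-variate-style correction: subtract the expected error
# accumulated over the 256 multiplications. No retraining needed,
# only a cheap additive correction at the end of the accumulation.
corrected = approx - len(a) * MU

print(abs(approx - exact), abs(corrected - exact))
```

Because the correction cancels the systematic (mean) component of the error, only the zero-mean residual remains, which is why such schemes let accelerators use much coarser multipliers without retraining.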


Exploring DNN Robustness Against Adversarial Attacks Using Approximate Multipliers

Askarizadeh, Mohammad Javad, Farahmand, Ebrahim, Castro-Godinez, Jorge, Mahani, Ali, Cabrera-Quiros, Laura, Salazar-Garcia, Carlos

arXiv.org Artificial Intelligence

Deep Neural Networks (DNNs) have advanced in many real-world applications, such as healthcare and autonomous driving. However, their high computational complexity and vulnerability to adversarial attacks are ongoing challenges. In this letter, approximate multipliers are used to explore DNN robustness improvement against adversarial attacks. By uniformly replacing accurate multipliers with state-of-the-art approximate ones in DNN layer models, we explore DNN robustness against various adversarial attacks in a feasible time. Results show up to a 7% accuracy drop due to approximations when no attack is present, while robust accuracy improves by up to 10% when attacks are applied.


ApproxDARTS: Differentiable Neural Architecture Search with Approximate Multipliers

Pinos, Michal, Sekanina, Lukas, Mrazek, Vojtech

arXiv.org Artificial Intelligence

Integrating the principles of approximate computing into the design of hardware-aware deep neural networks (DNNs) has led to DNN implementations showing good output quality and highly optimized hardware parameters such as low latency or inference energy. In this work, we present ApproxDARTS, a neural architecture search (NAS) method enabling the popular differentiable neural architecture search method called DARTS to exploit approximate multipliers and thus reduce the power consumption of generated neural networks. We showed on the CIFAR-10 data set that ApproxDARTS is able to perform a complete architecture search within less than 10 GPU hours and produce competitive convolutional neural networks (CNNs) containing approximate multipliers in convolutional layers. For example, ApproxDARTS created a CNN showing an energy consumption reduction of (a) 53.84% in the arithmetic operations of the inference phase compared to the CNN utilizing the native 32-bit floating-point multipliers and (b) 5.97% compared to the CNN utilizing the exact 8-bit fixed-point multipliers, in both cases with a negligible accuracy drop. Moreover, ApproxDARTS is 2.3x faster than a similar but evolutionary algorithm-based method called EvoApproxNAS.


AdAM: Adaptive Fault-Tolerant Approximate Multiplier for Edge DNN Accelerators

Taheri, Mahdi, Cherezova, Natalia, Nazari, Samira, Rafiq, Ahsan, Azarpeyvand, Ali, Ghasempouri, Tara, Daneshtalab, Masoud, Raik, Jaan, Jenihhin, Maksim

arXiv.org Artificial Intelligence

The role of Deep Neural Networks (DNNs) in a wide range of safety- and mission-critical applications (e.g., autonomous driving) is expanding. Therefore, deployment of a DNN accelerator requires addressing the trade-off between different design parameters and reliability [1], [2]. The remainder of the paper is organized as follows: Section II summarizes related works, the proposed method is presented in Section III, Section IV provides the experimental setup and discusses the results, and finally, the work is concluded in Section V.